Abstract

The fundamental problem of interest here is soft MIMO detection. We propose a method, referred to as subspace marginalization with interference suppression (SUMIS), that yields unprecedented performance at low and fixed (deterministic) complexity. Our method provides a well-defined tradeoff between computational complexity and performance. Apart from an initial sorting step consisting of selecting channel-matrix columns, the algorithm involves neither searching nor algorithmic branching; hence the algorithm has a completely predictable run-time, and it is readily and massively parallelizable. We present and numerically assess how SUMIS works in different practical settings: full/partial channel state information, sequential/iterative decoding, and low/high rate outer codes. We also comment on how the SUMIS method performs when the number of transmit antennas grows.



I. Introduction 

We consider multiple-input multiple-output (MIMO) systems, which are known to substantially increase both the spectral efficiency in rich scattering environments [1] and the link robustness. A major difficulty in the implementation of MIMO systems is the signal separation (detection) problem, which is generally computationally expensive to solve. This problem can be especially pronounced in large MIMO systems [2], [3]. The main reason why MIMO detection is difficult is the occurrence of ill-conditioned MIMO channels. For instance, the complexity of the optimal detector, which computes the log-likelihood ratio (LLR) values exactly and therefore solves the MIMO detection problem optimally, grows exponentially with the number of transmit antennas and polynomially with the size of the signal constellation. Suboptimal and fast methods, such as zero-forcing, perform well only for well-conditioned channels. For instance, if nature provides an ill-conditioned channel that does not change over several transmitted data packets, the zero-forcing detector will introduce significant errors and the probability of decoding the data packet wrongly will be large. The ill-conditioned scenario is a difficult one that requires sophisticated techniques to deal with.

Many different methods have been proposed over the past two decades that aim to achieve, with reduced computational complexity, the performance of the optimal detector [4]-[9]. Most of today's state-of-the-art detectors provide the possibility of trading complexity for performance via the choice of some user parameter. One important advantage of such detectors is that the tradeoff parameter can be adapted to the effective channel conditions in order to improve the overall performance [10], [11]. Amongst these detectors, there are two main subcategories. The first consists of detectors that do not have fixed complexity and includes, in particular, methods that perform a reduced tree search, such as the sphere-decoding (SD) aided max-log method and its relatives [?]. One of the more recent ones is the reduced dimension maximum-likelihood search (RD-MLS) of [?], [7]. Unfortunately, the methods in this category have an exponential worst-case complexity unless a suboptimal termination criterion is used. The other subcategory consists of detectors that have fixed complexity. These are much more desirable from an implementation point of view, especially to avoid over-dimensioning of the hardware. Examples of such detectors are the soft-output via partial marginalization (PM) method [8] and the fixed-complexity SD (FCSD) aided max-log method. These fixed-complexity detectors provide a simple and well-defined tradeoff between computational complexity and performance, they have a fixed and fully predictable run time, and they are highly parallelizable. Note that the FCSD is equivalent to the PM method with an additional max-log approximation.

Summary of Contribution: We propose a new method that is inspired by the ideas in [?] of partitioning the original problem into smaller problems. As in the PM method, we perform marginalization over a few of the bits when computing the LLR values. The approximate LLRs that enter the marginalization are much simpler than those in PM, and this substantially reduces the complexity of our algorithm, as will become clear in Sec. VII. In addition, we suppress the interference on the considered subspace by performing soft interference suppression (SIS). The SIS procedure, which is one of the constituents of our algorithm, is inspired by the work in [12]-[15], where the core idea behind SIS germinated in [12] in a rather different context. The main difference between the SIS procedure in our work and that in [13]-[15] is that we allow the signal subspace (and the interfering subspace) to have varying dimensionality. The additional differences are: (i) we perform the SIS in a MIMO setting internally, without the need for a priori information from the decoder, as opposed to [13], and (ii) we neither iterate the internal LLR values nor ignore the correlation between the interfering terms over the different receive antennas, as is done in [13], [14]. We refer to the new method as subspace marginalization with interference suppression (SUMIS). The ideas behind SUMIS are fundamentally simple and allow for very simple and massively parallelizable algorithmic implementations, which result in extremely low-complexity detection with surprisingly good accuracy. This method works well for both under- and over-determined MIMO systems. We extend our conference paper [16] by presenting several new features of SUMIS and a detailed complexity analysis.



II. Preliminaries 
We consider the real-valued MIMO-channel model

$$ y = Hs + e, \tag{1} $$

where $H \in \mathbb{R}^{N_R \times N_T}$ is the MIMO channel matrix and $s \in \mathcal{S}^{N_T}$ is the transmitted vector. We assume that $\mathcal{S} = \{-1, +1\}$ is a binary phase-shift keying (BPSK) constellation; hence referring to a "symbol" is equivalent to referring to a "bit". With some extra expense of notation, as will be clear later, it is straightforward to extend all results that we present to higher order constellations. Further, $e \in \mathbb{R}^{N_R} \sim \mathcal{N}(0, \frac{N_0}{2}I)$ denotes the noise vector and $y \in \mathbb{R}^{N_R}$ is the received vector. The channel is perfectly known to the receiver unless stated otherwise, and in what follows we assume that $N_R \geq N_T$, since this is typical in practice and simplifies the mathematics in this paper. Note that the SUMIS detector does not require $N_R \geq N_T$, but some of the competing methods do; hence this restriction on the dimensions. With separable complex symbol constellations, such as quadrature phase-shift keying modulation, every complex-valued model

$$ y_c = H_c s_c + e_c, \qquad e_c \sim \mathcal{CN}(0, N_0 I), \tag{2} $$

where $(\cdot)_c$ denotes the complex-valued counterparts of (1), can be posed as a real-valued model in the form of (1) by setting

$$ y = \begin{bmatrix} \mathrm{Re}\{y_c\} \\ \mathrm{Im}\{y_c\} \end{bmatrix}, \quad s = \begin{bmatrix} \mathrm{Re}\{s_c\} \\ \mathrm{Im}\{s_c\} \end{bmatrix}, \quad H = \begin{bmatrix} \mathrm{Re}\{H_c\} & -\mathrm{Im}\{H_c\} \\ \mathrm{Im}\{H_c\} & \mathrm{Re}\{H_c\} \end{bmatrix}, \quad \text{and} \quad e = \begin{bmatrix} \mathrm{Re}\{e_c\} \\ \mathrm{Im}\{e_c\} \end{bmatrix}. $$
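The mapping from (2) to (1) can be checked numerically. The following Python/NumPy sketch is our own illustration (the function names `complex_to_real` and `stack` are hypothetical): it draws a random complex model with QPSK symbols, builds the real-valued $y$ and $H$ as above, and confirms that $y = Hs + e$ holds with the stacked $s$ and $e$.

```python
import numpy as np


def complex_to_real(y_c, H_c):
    """Real-valued y and H of (1) from the complex-valued model (2)."""
    y = np.concatenate([y_c.real, y_c.imag])
    H = np.block([[H_c.real, -H_c.imag],
                  [H_c.imag, H_c.real]])
    return y, H


def stack(v_c):
    """Stack real parts on top of imaginary parts."""
    return np.concatenate([v_c.real, v_c.imag])


rng = np.random.default_rng(0)
Nc_R, Nc_T = 3, 2                                   # complex dimensions
H_c = rng.standard_normal((Nc_R, Nc_T)) + 1j * rng.standard_normal((Nc_R, Nc_T))
s_c = rng.choice([-1.0, 1.0], Nc_T) + 1j * rng.choice([-1.0, 1.0], Nc_T)  # QPSK
e_c = 0.1 * (rng.standard_normal(Nc_R) + 1j * rng.standard_normal(Nc_R))
y_c = H_c @ s_c + e_c

y, H = complex_to_real(y_c, H_c)
s, e = stack(s_c), stack(e_c)
print(H.shape, np.allclose(y, H @ s + e))           # (6, 4) True
```

Each QPSK symbol becomes two BPSK entries of the real vector $s$, which is why the BPSK-only presentation loses no generality for separable constellations.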



A. Optimal Soft MIMO Detection 

The optimal soft information desired by the channel decoder is the a posteriori log-likelihood ratio $l(s_i|y) = \log \frac{P(s_i = +1|y)}{P(s_i = -1|y)}$, where $s_i$ is the i:th bit of the transmitted vector $s$. The quantity $l(s_i|y)$ tells us how much more likely it is that the i:th bit of $s$ equals $+1$ rather than $-1$. By marginalizing out all the bits except the i:th bit and using Bayes' rule, the log-likelihood ratio (LLR) becomes

$$ l(s_i|y) = \log \frac{\sum_{s: s_i(s)=+1} P(s|y)}{\sum_{s: s_i(s)=-1} P(s|y)} = \log \frac{\sum_{s: s_i(s)=+1} p(y|s)P(s)}{\sum_{s: s_i(s)=-1} p(y|s)P(s)}, \tag{3} $$

where the notation $\sum_{s: s_i(s)=x}$ means the sum over all possible vectors $s \in \mathcal{S}^{N_T}$ for which the i:th bit is equal to $x$. If one assumes uniform a priori probabilities, i.e., $P(s) = 1/2^{N_T}$, the LLR can be written as

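$$ l(s_i|y) = \log \frac{\sum_{s: s_i(s)=+1} p(y|s)}{\sum_{s: s_i(s)=-1} p(y|s)} = \log \frac{\sum_{s: s_i(s)=+1} \exp\left(-\frac{1}{N_0}\|y - Hs\|^2\right)}{\sum_{s: s_i(s)=-1} \exp\left(-\frac{1}{N_0}\|y - Hs\|^2\right)}. \tag{4} $$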



In (3) and (4), there are $2^{N_T}$ terms that need to be evaluated and added. The complexity of this task is exponential in $N_T$, and this is the main problem in MIMO detection. Thus, many approximate methods have been proposed. One very good approximation of (4) is the so-called max-log approximation,

$$ l(s_i|y) \approx \log \frac{\max_{s: s_i(s)=+1} \exp\left(-\frac{1}{N_0}\|y - Hs\|^2\right)}{\max_{s: s_i(s)=-1} \exp\left(-\frac{1}{N_0}\|y - Hs\|^2\right)}, \tag{5} $$

where only the largest term in each sum of (4) is kept, i.e., the term with the smallest norm in the exponent. Note that even though it is computationally simpler to keep only one term in each sum, one needs to search over $2^{N_T}$ terms to find the largest ones; hence the exponential complexity remains. Nevertheless, with this approximation, any hard-decision detector, such as SD, can be made to produce soft values. This has resulted in much of the literature focusing on finding efficient hard-decision methods. Our philosophy is to take a different route and instead devise a good approximation of (3) and (4) directly.
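For small $N_T$, (4) and (5) can be evaluated by brute force, which makes the $2^{N_T}$ cost explicit. The Python/NumPy sketch below is a minimal illustration under the assumptions of the real-valued BPSK model (1) with known $H$ and $N_0$; the function names are ours, not from this paper. The exact LLR uses a log-sum-exp over each half of the hypotheses, and the max-log version keeps only the largest term of each sum.

```python
import itertools
import numpy as np


def all_bpsk_vectors(n):
    """All 2**n vectors in {-1, +1}^n, one per row."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n)))


def _log_likelihoods(y, H, N0):
    """Candidates S and -||y - Hs||^2 / N0, i.e. log p(y|s) up to a constant."""
    S = all_bpsk_vectors(H.shape[1])
    return S, -np.sum((y[None, :] - S @ H.T) ** 2, axis=1) / N0


def exact_llr(y, H, N0):
    """Exact LLRs (4) for all bits (uniform prior) via log-sum-exp."""
    S, metric = _log_likelihoods(y, H, N0)
    return np.array([np.logaddexp.reduce(metric[S[:, i] > 0])
                     - np.logaddexp.reduce(metric[S[:, i] < 0])
                     for i in range(H.shape[1])])


def maxlog_llr(y, H, N0):
    """Max-log LLRs (5): keep only the largest term of each sum."""
    S, metric = _log_likelihoods(y, H, N0)
    return np.array([metric[S[:, i] > 0].max() - metric[S[:, i] < 0].max()
                     for i in range(H.shape[1])])


rng = np.random.default_rng(1)
H = rng.standard_normal((4, 3))
s = rng.choice([-1.0, 1.0], 3)
N0 = 0.5
y = H @ s + np.sqrt(N0 / 2) * rng.standard_normal(4)
print(exact_llr(y, H, N0))
print(maxlog_llr(y, H, N0))
```

Since each sum in (4) has $2^{N_T-1}$ terms, a log-sum-exp exceeds its largest term by at most $(N_T - 1)\log 2$, so the two LLRs of a bit can differ by at most that amount.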

In order to explain our proposed method and the competing state-of-the-art methods, for fixed $n_s \in \{1, \ldots, N_T\}$, we define the following partitioning of the model in (1):

$$ y = Hs + e = \underbrace{[\bar H \ \ \tilde H]}_{\text{col. permut. of } H}\underbrace{[\bar s^T \ \ \tilde s^T]^T}_{\text{permut. of } s} + e = \bar H\bar s + \tilde H\tilde s + e, \tag{6} $$

where $\bar H \in \mathbb{R}^{N_R \times n_s}$, $\tilde H \in \mathbb{R}^{N_R \times (N_T - n_s)}$, $\bar s \in \mathcal{S}^{n_s}$ contains the i:th bit $s_i$ of the original vector $s$, and $\tilde s \in \mathcal{S}^{N_T - n_s}$. The choice of partitioning involves the choice of a permutation, and how to make this choice (for $n_s > 1$) is not obvious. In fact, for each bit in $s$, there are $\binom{N_T - 1}{n_s - 1}$ possible permutations in (6). How we perform this partitioning is explained in Sec. III-C. Note that for different detectors, the choice of partitioning serves different purposes.

B. Today's State-of-the-Art MIMO Detectors 

The PM Method in [8]: PM offers a tradeoff between exact and approximate computation of (4) via a parameter $r = n_s - 1 \in \{0, \ldots, N_T - 1\}$. We present the slightly modified version in [11] of the original method in [8]; it is simpler than the original but does not compromise performance. The PM method implements a two-step approximation of (4). More specifically, in the first step it approximates the sums of (4) that correspond to $\tilde s \in \mathcal{S}^{N_T - n_s}$ with a maximization,

$$ l(s_i|y) \approx \log \frac{\sum_{\bar s: s_i(\bar s)=+1} \max_{\tilde s} \exp\left(-\frac{1}{N_0}\|y - \bar H\bar s - \tilde H\tilde s\|^2\right)}{\sum_{\bar s: s_i(\bar s)=-1} \max_{\tilde s} \exp\left(-\frac{1}{N_0}\|y - \bar H\bar s - \tilde H\tilde s\|^2\right)}. \tag{7} $$

In the second step, the maximization in (7) is approximated with a linear filter with quantization (clipping), such as the zero-forcing with decision-feedback (ZF-DF) detector [?]. The ZF-DF method is computationally much more efficient than exact maximization, but it performs well only for well-conditioned matrices. However, the max problems in (7) are generally well-conditioned since the matrices $\tilde H$ are typically tall. For PM, when forming the partitioning in (6), the original bit order in $s = [s_1, \ldots, s_{N_T}]^T$ is permuted such that the condition number of $\tilde H$ is minimized, see [11]. Notably, PM performs ZF-DF aided max-log detection in the special case of $r = 0$ and computes the exact LLR values (as defined by (4)) for $r = N_T - 1$.

The FCSD Method in [?]: FCSD essentially performs the same procedure as the PM method, except that it introduces an additional approximation by applying the max-log approximation to the sums in the PM method, i.e., the sums over $\{\bar s \in \mathcal{S}^{n_s}: s_i(\bar s) = x\}$ in (7). Hence, instead of performing summations over $\{\bar s \in \mathcal{S}^{n_s}: s_i(\bar s) = x\}$ for each $x$ as in PM, it picks the best candidate from that set for each $x$.

The RD-MLS Method in [?], [7]: RD-MLS carries out the same procedure as FCSD, except that it does not perform clipping after the linear filtering and it uses an SD-type algorithm to perform a reduced tree search over $\{\bar s \in \mathcal{S}^{n_s}: s_i(\bar s) = x\}$ for each $x$. Although this method reduces the number of layers in the tree, it does not necessarily improve the conditioning of the reduced problem, as the PM and FCSD methods do. This is due to the unquantized linear filtering operation, which essentially projects the original space (the column space of $H$) onto the orthogonal complement of the column space of $\tilde H$. Therefore, for an ill-conditioned matrix $H$, it is unclear whether the RD-MLS algorithm would visit significantly fewer branches in the reduced space $\bar s$ than in the original space $s$. Note that both FCSD and RD-MLS are designed for hard detection, which via the max-log approximation can produce soft values, as opposed to PM, which directly produces soft values.

III. Proposed Soft MIMO Detector (SUMIS) 

In our proposed method, which we refer to as the subspace marginalization with interference suppression method and abbreviate as SUMIS, there are two main stages. In stage I, a first approximation to the a posteriori probability (for BPSK equivalent to the LLR) of each bit is computed. In stage II, these approximate LLRs are used in an interference suppression mechanism, after which the LLR values are calculated based on the resulting "purified" model. In this section, we assume that $P(s)$ is uniform. The case of non-uniform $P(s)$ is treated in Sec. IV. In what follows, we present a non-optimized version of the SUMIS detector, which is simpler to explain than the optimized version. The latter is presented in Sec. VII and is the version a practical implementation would use.

A. Stage I 

We start with the partitioned model in (6),

$$ y = \bar H\bar s + \underbrace{\tilde H\tilde s + e}_{\text{interference+noise}}, \tag{8} $$

and define an approximate model $\tilde y = \bar H\bar s + \tilde n$, where $\tilde n$ is a Gaussian stochastic vector $\mathcal{N}(0, Q)$ with $Q = \tilde H\Psi\tilde H^T + \frac{N_0}{2}I$ and $\Psi$ is the covariance matrix of $\tilde s$. Under the assumption that the symbols are independent, $\Psi$ is diagonal, and since $P(s)$ is uniform, $\Psi = I$. It is important to note that $\tilde y$ is an approximate model of what we actually receive, namely $y$. We will use the approximate model $\tilde y$ to simplify the probability density functions needed for the evaluation of soft values, but we will insert the actual received data $y$ when computing them.

To compute the a posteriori probability $P(s_k|y)$ of a bit contained in $\bar s$, we can marginalize out the remaining bits in $P(\bar s|y)$. Note that computing $P(\bar s|y)$ itself requires marginalizing out $\tilde s$ from $P(s|y)$, which is computationally very burdensome. Now, knowing that $P(\bar s|y) \propto p(y|\bar s)P(\bar s) \propto p(y|\bar s)$, the key is to approximate the function $p(y|\bar s)$ with $p(\tilde y|\bar s)$, which essentially means that the interfering terms $\tilde H\tilde s$ are approximated as Gaussian. This is a reasonable approximation since each element in $\tilde H\tilde s$ is a sum of variates and thus generally behaves like a Gaussian, especially when the variates are independent and large in number; the last claim follows from the central limit theorem. Now, with the weighted norm $\|\cdot\|_Q^2 = (\cdot)^T Q^{-1}(\cdot)$, the approximate likelihood function is

$$ p(\tilde y|\bar s) = \frac{1}{\sqrt{(2\pi)^{N_R}|Q|}} \exp\left(-\frac{1}{2}\|\tilde y - \bar H\bar s\|_Q^2\right), \tag{9} $$

and the a posteriori probability function $P(s_k|y)$ can be approximated with the function

$$ P(s_k = s|\tilde y) \propto \sum_{\bar s: s_k = s} p(\tilde y|\bar s). \tag{10} $$

Note that the number of summation terms over $\bar s: s_k = s$ is $2^{n_s - 1}$, which is significantly smaller than the number required to evaluate $P(s_k|y)$ exactly. Due to the assumption that $\mathcal{S}$ is BPSK, we can perform the marginalization in (10) in the LLR domain as (after inserting $y$ in place of $\tilde y$)

$$ \lambda_k = \log \frac{\sum_{\bar s: s_k = +1} \exp\left(-\frac{1}{2}\|y - \bar H\bar s\|_Q^2\right)}{\sum_{\bar s: s_k = -1} \exp\left(-\frac{1}{2}\|y - \bar H\bar s\|_Q^2\right)}, \tag{11} $$



which can be efficiently computed using the Jacobian logarithm. The a posteriori probabilities of the remaining elements in $s$ are approximated analogously to (8)-(11) by simply choosing different partitionings (permutations) of $H$ and $s$ such that the bit of interest is in $\bar s$. The main purpose of the first stage is to reduce the impact of the interfering term $\tilde H\tilde s$. For this purpose, we approximately compute the conditional expected value of bit $s_k$ using the function $P(s_k = s|\tilde y)$,

$$ E\{s_k|y\} = \sum_{s \in \mathcal{S}} sP(s_k = s|y) \approx \sum_{s \in \mathcal{S}} sP(s_k = s|\tilde y)\Big|_{\tilde y = y} = \frac{1}{1 + e^{-\lambda_k}} - \frac{1}{1 + e^{\lambda_k}} = \tanh\left(\frac{\lambda_k}{2}\right). \tag{12} $$

This stage is performed for all bits $s_k$ in $s$, i.e., $k = 1, \ldots, N_T$. For higher order constellations, this stage would be performed symbol-wise and not bit-wise as presented here. This is the reason for using the index $k$ and not the index $i$ in this stage.

B. Stage II: Purification 

For each bit $s_i$, the interfering vector $\tilde s$ in (6) is suppressed using

$$ y' = y - \tilde H E\{\tilde s|y\} = \bar H\bar s + \underbrace{\tilde H(\tilde s - E\{\tilde s|y\}) + e}_{\text{interference+noise}} \approx \bar H\bar s + n', \tag{13} $$

where $n' \sim \mathcal{N}(0, Q')$ with $Q' = \tilde H\Phi\tilde H^T + \frac{N_0}{2}I$, and $\Phi$ is the conditional covariance matrix of $\tilde s$. Hence, under the approximation that the elements in $\tilde s$ are independent conditioned on $y$, we have

$$ \Phi = E\{\mathrm{diag}(\tilde s)^2|y\} - E\{\mathrm{diag}(\tilde s)|y\}^2, \tag{14} $$

where the operator $\mathrm{diag}(\cdot)$ takes a vector as input and returns a diagonal matrix with its elements on the diagonal, and $(\cdot)^2$ denotes $(\cdot)(\cdot)$. Since $\mathcal{S} = \{-1, +1\}$, we get

$$ \Phi = I - \mathrm{diag}(E\{\tilde s|y\})^2. $$
After the interfering vector $\tilde s$ is suppressed and the model is "purified", we compute the LLRs by performing a full-blown marginalization as in (4) over the corresponding $n_s$-dimensional subspace $\bar s$, using the purified approximate model in (13). Hence, the LLR value we compute for the i:th bit is

$$ l(s_i|y) \approx \log \frac{\sum_{\bar s: s_i(\bar s)=+1} \exp\left(-\frac{1}{2}\|y' - \bar H\bar s\|_{Q'}^2\right)}{\sum_{\bar s: s_i(\bar s)=-1} \exp\left(-\frac{1}{2}\|y' - \bar H\bar s\|_{Q'}^2\right)}. \tag{15} $$

For higher order constellations, this stage is performed bit-wise; hence the index $i$.
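A matching sketch of stage II for one bit, under the same assumptions (real BPSK model, hypothetical function name, subspace indices given), is shown below. It takes the vector `soft` of stage-I estimates $E\{s_k|y\}$ for all $N_T$ bits, purifies $y$ as in (13), builds $Q'$ with $\Phi$ from (14), and evaluates (15).

```python
import itertools
import numpy as np


def stage_two_llr(y, H, N0, sub_idx, i_pos, soft):
    """LLR (15) of the bit at position i_pos within sub_idx, after purification (13).

    soft: length-N_T array of stage-I estimates E{s_k | y}.
    """
    N_R, N_T = H.shape
    rest = [j for j in range(N_T) if j not in sub_idx]
    H_bar, H_til = H[:, sub_idx], H[:, rest]
    m = np.asarray(soft, dtype=float)[rest]             # E{s_tilde | y}
    y_p = y - H_til @ m                                 # y' in (13)
    Phi = np.diag(1.0 - m ** 2)                         # (14) for BPSK
    Q_p = H_til @ Phi @ H_til.T + (N0 / 2) * np.eye(N_R)
    Q_inv = np.linalg.inv(Q_p)
    S = np.array(list(itertools.product([-1.0, 1.0], repeat=len(sub_idx))))
    R = y_p[None, :] - S @ H_bar.T                      # residuals y' - H_bar s_bar
    metric = -0.5 * np.einsum("ij,jk,ik->i", R, Q_inv, R)
    plus, minus = metric[S[:, i_pos] > 0], metric[S[:, i_pos] < 0]
    return np.logaddexp.reduce(plus) - np.logaddexp.reduce(minus)
```

Two limiting cases help in reading (13)-(15): if the stage-I estimates are exactly $\pm 1$, then $\Phi = 0$ and the interference is cancelled as if it were known; if they are all 0, then $y' = y$ and $Q' = Q$, so stage II returns the stage-I value.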



C. Choosing the Permutations 

The optimal permutation would be the one that minimizes the probability of decoding error, and this permutation is hard to find. There are many ways to choose the permutation via heuristic arguments. We aim to choose the partitioning such that, for a bit $s_k$ in $\bar s$, the interfering vector $\tilde s$ in (6) has as little effect on $\bar s$ as possible. This essentially means that the inner products between the columns in $\bar H$ and those in $\tilde H$ should be as small as possible. Therefore, we base our partitioning on $H^TH$, which has the structure

$$ H^TH = \begin{bmatrix} \sigma_1^2 & \rho_{1,2} & \cdots \\ \rho_{1,2} & \sigma_2^2 & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}, $$

and for column (or row) $k$ of $H^TH$ (corresponding to bit $k$ in $s$) we pick the $n_s - 1$ indices $\ell$ that correspond to the largest values $|\rho_{k,\ell}|$. These indices, along with the index $k$, specify the columns of $H$ that are placed in $\bar H$. The remaining columns are placed in $\tilde H$.
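The selection rule can be sketched in a few lines of Python/NumPy. The function name and the tie-breaking by `argsort` order are our assumptions, since the text does not specify how ties are handled. The function returns bit $k$ first, so bit $k$ sits at position 0 of $\bar s$.

```python
import numpy as np


def choose_subspace(H, k, n_s):
    """Column indices placed in H_bar for bit k (Sec. III-C heuristic).

    Returns k followed by the n_s - 1 indices l != k with the largest
    |rho_{k,l}|, where rho_{k,l} are the off-diagonal entries of H^T H.
    """
    rho = np.abs(H.T @ H)[k].copy()
    rho[k] = -np.inf                                  # skip the diagonal sigma_k^2
    partners = np.argsort(rho)[::-1][: n_s - 1]       # largest |rho_{k,l}| first
    return [k] + sorted(int(j) for j in partners)


H = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.1, 0.0],
              [0.0, 0.0, 1.0]])
print(choose_subspace(H, 0, 2))   # [0, 1]: column 1 is nearly parallel to column 0
print(choose_subspace(H, 2, 1))   # [2]
```

In a full implementation, $H^TH$ would be computed once and reused for all $N_T$ bits, which is the cost counted in Sec. VII; the per-call recomputation here only keeps the sketch short.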



D. Summary 

The steps of the SUMIS algorithm are summarized in Alg. 1 in the form of generic pseudo-code. Via the adjustable subspace dimensionality, i.e., the parameter $n_s$, our method provides a simple and well-defined tradeoff between computational complexity and detection performance. For $n_s = N_T$, there is no interfering vector $\tilde s$ and SUMIS performs exact LLR computation. For $n_s = 1$, SUMIS becomes the soft MMSE method with the additional step of model purification.
Algorithm 1 Subspace Marginalization with Interference Suppression (SUMIS)

Start with some $H$, $y$ and $n_s \in \{1, \ldots, N_T\}$
for $k = 1, \ldots, N_T$ do  // First stage
    Decide upon a partitioning in (6) based on $H^TH$
    Calculate $\lambda_k$ in (11) (cond. probability of $s_k$ in terms of LLR)
    Calculate $E\{s_k|y\}$ and $\mathrm{Var}\{s_k|y\} = 1 - E\{s_k|y\}^2$ in (12)
end for
for each bit in $s$ do  // Second stage
    Suppress the interfering vector $\tilde s$ and calculate $y'$ in (13)
    Calculate the new covariance matrix $Q'$
    Calculate the LLR in (15)
end for



IV. Non-Uniform A Priori Probabilities 

The algorithm in Sec. III can be directly extended to the case of non-uniform $P(s)$. The details are as follows.

A. Stage I 

Since we have a priori information on the symbols, we can purify the model already in this stage and suppress the interfering subspace $\tilde s$. First, we evaluate the a priori expectation $E\{\tilde s\}$ and the purified received data

$$ y - \tilde H E\{\tilde s\} = \bar H\bar s + \underbrace{\tilde H(\tilde s - E\{\tilde s\}) + e}_{\text{interference+noise}}, \tag{16} $$

where the "interference+noise" is, as in (8) and the approximate model $\tilde y = \bar H\bar s + \tilde n$, approximated as $\mathcal{N}(0, Q)$, where now $\Psi$ in $Q$ is not necessarily equal to the identity matrix. More precisely, under the restriction $\mathcal{S} = \{-1, +1\}$ and the assumption that the bits $s_k$ are independent, we get

$$ \Psi = I - \mathrm{diag}(E\{\tilde s\})^2. $$

We can approximate the a posteriori probability function $P(s_k = s|y)$, analogously to (10), with

$$ P(s_k = s|\tilde y) \propto \sum_{\bar s: s_k = s} p(\tilde y|\bar s)P(\bar s). $$
Using this probability function, we can approximate the expectation of $s_k$ conditioned on $y$ in the same manner as in (12), i.e.,

$$ E\{s_k|y\} \approx \tanh\left(\frac{\lambda_k}{2}\right), \qquad \lambda_k = \log \frac{P(s_k = +1|\tilde y)}{P(s_k = -1|\tilde y)}\bigg|_{\tilde y = y - \tilde H E\{\tilde s\}}. \tag{17} $$

Similarly to stage I in Sec. III, for higher order constellations this stage is performed symbol-wise; hence we again use the index $k$.

B. Stage II 

In this stage, exactly the same procedure is performed as in stage II in Sec. III, with two minor modifications: first, the model is purified using (17) instead of (12), and then the LLR value of the i:th bit is computed using

$$ l(s_i|y) \approx \log \frac{\sum_{\bar s: s_i(\bar s)=+1} \exp\left(-\frac{1}{2}\|y' - \bar H\bar s\|_{Q'}^2\right)P(\bar s)}{\sum_{\bar s: s_i(\bar s)=-1} \exp\left(-\frac{1}{2}\|y' - \bar H\bar s\|_{Q'}^2\right)P(\bar s)}. \tag{18} $$

For higher order constellations, this stage is performed bit-wise; hence the index $i$.
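The only change in (18) relative to (15) is the weight $P(\bar s)$. The hypothetical Python/NumPy sketch below assumes independent bits whose priors are supplied as a priori LLRs $L_j = \log\frac{P(s_j=+1)}{P(s_j=-1)}$ (an input format we choose for illustration). Since $\log P(s_j) = s_jL_j/2 - \log(2\cosh(L_j/2))$, it adds $\sum_j s_jL_j/2$ to the metric and drops the constant, which cancels in the ratio.

```python
import itertools
import numpy as np


def prior_weighted_llr(y_p, H_bar, Q_p, pos, prior_llr):
    """LLR (18): like (15), but each s_bar is weighted by its a priori P(s_bar).

    prior_llr: a priori LLRs of the n_s bits in s_bar, assumed independent
    (hypothetical input format for this sketch).
    """
    S = np.array(list(itertools.product([-1.0, 1.0], repeat=H_bar.shape[1])))
    R = y_p[None, :] - S @ H_bar.T                      # residuals y' - H_bar s_bar
    metric = -0.5 * np.einsum("ij,jk,ik->i", R, np.linalg.inv(Q_p), R)
    log_prior = S @ (np.asarray(prior_llr, dtype=float) / 2)  # log P(s_bar) + const
    total = metric + log_prior
    return (np.logaddexp.reduce(total[S[:, pos] > 0])
            - np.logaddexp.reduce(total[S[:, pos] < 0]))


# With an uninformative channel (H_bar = 0) the output equals the prior LLR.
print(prior_weighted_llr(np.ones(3), np.zeros((3, 2)), np.eye(3), 0, [1.5, -0.4]))
```

With all prior LLRs set to zero, the weights are equal and (18) reduces to (15).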

V. Imperfect Channel-State Information

Here, we address an important circumstance that often arises in practice, namely when the receiver does not have perfect knowledge of $H$. In practice, the receiver then typically forms an estimate of the channel based on a known transmitted pilot matrix $s^{1:N_{tr}} = [s^1 \ldots s^{N_{tr}}]$ and the corresponding received matrix $y^{1:N_{tr}} = [y^1 \ldots y^{N_{tr}}]$. The channel matrix so obtained will not be perfectly accurate, and the uncertainty involved needs to be modeled and taken into account when computing the soft bit values (LLRs).

What we have used previously to compute the soft values is $P(s|y)$, where full knowledge of $H$ was implicit; it may be written out explicitly as $P(s|y, H)$. Now, however, we do not have access to $P(s|y, H)$; instead, we have access to $P(s|y, y^{1:N_{tr}}, s^{1:N_{tr}})$. To compute the soft values in this case, we would, analogously to (3), like to marginalize over all the bits in $s$ except the one of interest. As with (3), this is a computationally burdensome task. Therefore, we next extend the computationally light procedure of SUMIS to take channel estimation errors into account, i.e., to use $P(s|y, y^{1:N_{tr}}, s^{1:N_{tr}})$ instead of just plugging $\hat H$ (an estimate of $H$) in place of $H$ into $P(s|y, H)$ in Sec. III. Detection using the latter approach is typically referred to as mismatched detection. Note that since $\hat H$ is a function of $y^{1:N_{tr}}$ and $s^{1:N_{tr}}$, the errors in the soft decisions based on $P(s|y, \hat H)$ cannot be smaller than those based on $P(s|y, y^{1:N_{tr}}, s^{1:N_{tr}})$, assuming that the receiver has no extra information aside from the training data when evaluating $\hat H$. Under some circumstances, though, such as when $\hat H$ is the MMSE estimate of $H$, they are equivalent [?].

Here we present two different extensions of SUMIS using imperfect CSI, and in both of them we model the channel estimate $\hat H$, obtained using the training data $s^{1:N_{tr}}$ and $y^{1:N_{tr}}$, as

$$ \hat H = H + \Delta, \tag{19} $$

where $H$ is the true channel and $\Delta$ is the estimation error matrix whose complex counterpart $\Delta_c$ (in the sense of (2)) has independent $\mathcal{CN}(0, \delta^2)$ elements, where $\delta^2$ is the estimation-error variance per complex dimension. Here, we further assume that the elements in $\hat H$ are independent of those in $\Delta$; one example of such a setting is when MMSE channel estimation is used [19].

A. Approach 1 

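The goal in this approach is to devise a matched version of SUMIS for constellations that have constant modulus, i.e., that satisfy, say, $\|s\|^2 = N_T$. Recall that $P(s|y, \hat H) = p(y|s, \hat H)P(s)/p(y)$, for which we aim to approximate $p(y|s, \hat H)$ using the philosophy of SUMIS. From (1) and (19), we have

$$ y = Hs + e = \hat Hs + e - \Delta s = \hat Hs + \tilde e. \tag{20} $$

For constellations that have constant modulus, we know that $\tilde e \sim \mathcal{N}\left(0, \frac{N_T\delta^2 + N_0}{2}I\right)$. In this case, the SUMIS algorithm can be applied directly. In the first stage of SUMIS, partition $\hat H$ as in (6), so that $y = \bar H\bar s + \tilde H\tilde s + \tilde e$ with $\bar H$ and $\tilde H$ now denoting the corresponding columns of $\hat H$; define $\tilde y = \bar H\bar s + \tilde n$, where $\tilde n \sim \mathcal{N}\left(0, \tilde H\Psi\tilde H^T + \frac{N_T\delta^2 + N_0}{2}I\right)$; and then approximate $p(y|\bar s, \hat H)$ with $p(\tilde y|\bar s, \hat H)$.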
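The noise variance used in Approach 1 can be checked by simulation. The Python/NumPy sketch below uses hypothetical function names and QPSK symbols, so that $\|s\|^2 = N_T$ in the real-valued model (where $N_T$ is twice the number of complex transmit dimensions). It draws $\Delta_c$ with i.i.d. $\mathcal{CN}(0, \delta^2)$ entries, maps it to real form as in (2), and compares the empirical per-dimension variance of $\tilde e = e - \Delta s$ with $(N_T\delta^2 + N_0)/2$.

```python
import numpy as np


def effective_noise_var(N_T, delta2, N0):
    """Per-real-dimension variance of e_tilde = e - Delta s when ||s||^2 = N_T."""
    return (N_T * delta2 + N0) / 2


def simulate_noise_var(Nc_R, Nc_T, delta2, N0, trials, seed=0):
    """Monte Carlo estimate of Var(e - Delta s) for QPSK (real BPSK) symbols."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(trials):
        # Delta_c has i.i.d. CN(0, delta2) entries; real-valued form as in (2).
        D_c = np.sqrt(delta2 / 2) * (rng.standard_normal((Nc_R, Nc_T))
                                     + 1j * rng.standard_normal((Nc_R, Nc_T)))
        D = np.block([[D_c.real, -D_c.imag], [D_c.imag, D_c.real]])
        s = rng.choice([-1.0, 1.0], 2 * Nc_T)          # ||s||^2 = N_T = 2 * Nc_T
        e = np.sqrt(N0 / 2) * rng.standard_normal(2 * Nc_R)
        out.append(e - D @ s)
    return np.var(np.concatenate(out))


print(effective_noise_var(6, 0.2, 0.5))                # (6 * 0.2 + 0.5) / 2, about 0.85
print(simulate_noise_var(4, 3, 0.2, 0.5, trials=5000))
```

For a constant-modulus constellation the variance does not depend on which $s$ was sent, which is exactly what lets the same $Q$ be used for every candidate.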



B. Approach 2 

Here, we deal with a more complicated case, namely when the signal constellation does not have constant $\|s\|^2$. The difficulty arises because the noise term $\tilde e$ is typically not Gaussian in such cases. Therefore, we cannot directly apply the above procedure. Nevertheless, since we are interested only in $p(y|\bar s, \hat H)$, we can approximate $\tilde e$ conditioned on $\bar s$ and $\hat H$ as Gaussian. That the Gaussian approximation of $\tilde e|_{\bar s, \hat H}$ is reasonable follows from the same argument as in Sec. III-A: each element in $\tilde e$ is a sum of variates, which are in addition independent due to the assumptions made on $\Delta$. The covariance matrix of $\tilde e$ conditioned on $\bar s$ and $\hat H$ is $\frac{(E\{\|\tilde s\|^2\} + \|\bar s\|^2)\delta^2 + N_0}{2}I$. Thus, the power of the effective noise is $\frac{(E\{\|\tilde s\|^2\} + \|\bar s\|^2)\delta^2 + N_0}{2}$ instead of $\frac{N_T\delta^2 + N_0}{2}$ as in Sec. V-A. The problem that arises is that this power now depends on $E\{\|\tilde s\|^2\}$ and $\bar s$. This causes the complexity of SUMIS to increase substantially, since the inverse of $Q$ must be recomputed for each permutation in (6), and even more often, for each $\bar s$; the same applies in the second stage of SUMIS for the $Q'$ matrix.

To avoid this complexity increase, we introduce further approximations. First, instead of $E\{\|\tilde s\|^2\}$, we use

$$ \frac{1}{N_T}\sum_{\substack{\text{all } N_T \\ \text{permuts.}}} E\{\|\tilde s\|^2\}, $$

where the sum is taken over all $N_T$ permutations considered in SUMIS for a particular $\hat H$. This is reasonable since $N_T \gg n_s$ and at most $n_s$ of the $N_T - n_s$ elements in $\tilde s$ are replaced from one permutation to another; hence, $E\{\|\tilde s\|^2\}$ will not differ much over the permutations. Second, using again that $N_T \gg n_s$, the variations in $\|\bar s\|^2$ will have a minor effect on the absolute power of the effective noise. Therefore, we replace $\|\bar s\|^2$ by its average over $\mathcal{S}^{n_s}$, $\frac{1}{|\mathcal{S}|^{n_s}}\sum_{\bar s \in \mathcal{S}^{n_s}} \|\bar s\|^2$. Hence, the simplified noise power, obtained by inserting these two averages in place of $E\{\|\tilde s\|^2\}$ and $\|\bar s\|^2$, is constant for each stage, which results in a SUMIS complexity equivalent to that of the full-CSI approach, i.e., the complexity presented in Sec. VII.

VI. Very Large MIMO Settings 

Previous research that addresses MIMO settings with a large number of antennas (both transmit and receive) [3] has focused on hard-decision algorithms. That is, the aim was to find efficient methods that achieve the same performance as the max-log method. As we will see in the numerical results, a better approach is to focus on approximating the exact LLR, since the performance loss of the max-log method is itself larger for larger MIMO systems. Another interesting conclusion is that sophisticated branching algorithms are not required, as the soft MMSE method achieves the performance of the exact LLR method for a sufficiently large number of transmit antennas. This claim follows from the central limit theorem and the following argument. Consider the model in (6) for $n_s = 1$, which makes $\bar H$ a column vector, say $h_i$, and $\bar s$ a scalar, say $s_i$; hence, $y = h_is_i + \tilde H\tilde s + e$. We assume that the elements in $\tilde s$ are independent, which is a very common assumption in the literature and a very reasonable one in practice, as many coded communication links include interleaving of the coded bits. By the central limit theorem, the received vector $y$ for a given $s_i$ converges in distribution to $\mathcal{N}(h_is_i, \tilde H\Psi\tilde H^T + \frac{N_0}{2}I)$ as we let $N_T$ grow. This suggests that the exact LLR function $L(s_i|y) = \log p(y|s_i = +1) - \log p(y|s_i = -1)$ will, for increasing $N_T$, approach the LLR function based on the Gaussian distribution, i.e.,

$$ L(s_i|y) \approx 2y^T\left(\tilde H\Psi\tilde H^T + \frac{N_0}{2}I\right)^{-1}h_i \quad \text{for } N_T \gg 1. \tag{21} $$

This is a simple but important observation that has not been fully emphasized in earlier literature, especially not in the work considering large MIMO settings. Even though (21) applies only for sufficiently large $N_T$, the question is how close the right-hand side is to the left-hand side of (21) in terms of frame-error rate for large but finite $N_T$. We have investigated this for $N_T = 26$ in our simulation results, which indicate that the performance of the soft MMSE method is much closer to that of the exact LLR method than in smaller systems such as $N_T = 12$. What is especially interesting for the SUMIS method is that the performance gap between the soft MMSE method and the exact LLR method is reduced remarkably via the procedure in stage II of SUMIS. Recall that, for $n_s = 1$, the SUMIS method performs the soft MMSE procedure in stage I.
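The Gaussian LLR in (21) follows from $\log\mathcal{N}(y; h_i, C) - \log\mathcal{N}(y; -h_i, C) = 2y^TC^{-1}h_i$ with $C = \tilde H\Psi\tilde H^T + \frac{N_0}{2}I$, because the quadratic terms in $y$ cancel. A minimal Python/NumPy sketch of this soft MMSE LLR follows; the function name is hypothetical, and $\Psi = I$ (uniform BPSK prior) is the default.

```python
import numpy as np


def soft_mmse_llr(y, H, N0, i, psi=None):
    """Gaussian-approximation LLR (21) for bit i, i.e. n_s = 1.

    psi: optional diagonal of Psi for the interfering bits (defaults to ones,
    the uniform-prior BPSK case).
    """
    N_R, N_T = H.shape
    rest = [j for j in range(N_T) if j != i]
    H_til = H[:, rest]
    psi = np.ones(N_T - 1) if psi is None else np.asarray(psi, dtype=float)
    C = (H_til * psi) @ H_til.T + (N0 / 2) * np.eye(N_R)   # H_til Psi H_til^T + N0/2 I
    return 2.0 * y @ np.linalg.solve(C, H[:, i])


# With one transmit antenna there is no interference, and (21) equals the
# exact LLR 4*y*h/N0 = 4 * 0.3 * 0.8 / 0.5, i.e. about 1.92.
print(soft_mmse_llr(np.array([0.3]), np.array([[0.8]]), 0.5, 0))
```

Using `np.linalg.solve` rather than forming $C^{-1}$ explicitly is a conventional numerical choice for a single right-hand side; it does not change the value of (21).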

VII. Complexity: Optimized SUMIS and Soft MMSE 

We next identify the main complexity bottlenecks of SUMIS step by step and keep track of the largest order-of-magnitude terms. The focus will be on the algorithm presented in Sec. III, how it can be optimized, and how many operations it requires. Note that the techniques presented in what follows can also be used for an optimized implementation of the soft MMSE method [20]. The complexity will be measured in terms of elementary operations bundled together: additions, subtractions, multiplications, and divisions. The complexity count is divided into two parts: a received-data-independent ($y$-independent) processing part and a $y$-dependent processing part. We also assume that $n_s < N_T \leq N_R$, which is the case of most practical interest. The assumption $N_T \leq N_R$ is required only for the upcoming optimized SUMIS version, due to the requirement for various inverses to exist; analogous complexity reductions can be made for $N_T > N_R$ but are excluded due to space limitations.

A. y-independent processing

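We start with the choice of permutations in Sec. III-C. The SUMIS algorithm uses $N_T$ different permutations that are decided based on $H^TH$. This procedure evaluates $H^TH$, requiring $N_T^2N_R$ operations, followed by a search for each permutation requiring $n_sN_T$ comparisons. Thus, the choice of permutations requires altogether $N_RN_T^2 + N_Tn_sN_T \approx N_RN_T^2$ operations.

$$ \|y - \bar H\bar s\|_Q^2 = (y - \bar H\bar s)^TQ^{-1}(y - \bar H\bar s) = y^TQ^{-1}y - 2y^TQ^{-1}\bar H\bar s + \bar s^T\bar H^TQ^{-1}\bar H\bar s $$
$$ = \left((\bar H^TQ^{-1}\bar H)^{-1}\bar H^TQ^{-1}y - \bar s\right)^T\bar H^TQ^{-1}\bar H\left((\bar H^TQ^{-1}\bar H)^{-1}\bar H^TQ^{-1}y - \bar s\right) + y^TQ^{-1}y - y^TQ^{-1}\bar H(\bar H^TQ^{-1}\bar H)^{-1}\bar H^TQ^{-1}y \tag{22} $$

$$ \bar H^TQ^{-1}\bar H = \bar H^T\left(\tfrac{N_0}{2}I + \tilde H\Psi\tilde H^T\right)^{-1}\bar H = \bar H^T\left(\tfrac{N_0}{2}I + H\check\Psi H^T - \bar H\bar\Psi\bar H^T\right)^{-1}\bar H \tag{23a} $$

Applying the matrix inversion lemma to (23a):

$$ \bar H^TQ^{-1}\bar H = \left(I - \bar H^T\left(\tfrac{N_0}{2}I + H\check\Psi H^T\right)^{-1}\bar H\bar\Psi\right)^{-1}\bar H^T\left(\tfrac{N_0}{2}I + H\check\Psi H^T\right)^{-1}\bar H \tag{23b} $$

$$ (\bar H^TQ^{-1}\bar H)^{-1}\bar H^TQ^{-1} = \left(\bar H^T\left(\tfrac{N_0}{2}I + H\check\Psi H^T\right)^{-1}\bar H\right)^{-1}\bar H^T\left(\tfrac{N_0}{2}I + H\check\Psi H^T\right)^{-1} \tag{24a} $$

$$ \bar H^T\left(\tfrac{N_0}{2}I + H\check\Psi H^T\right)^{-1} = P^TH^T\left(\tfrac{N_0}{2}I + H\check\Psi H^T\right)^{-1} = P^T\check\Psi^{-1}\check\Psi H^T\left(\tfrac{N_0}{2}I + H\check\Psi H^T\right)^{-1} = P^T\check\Psi^{-1}\left(\tfrac{N_0}{2}\check\Psi^{-1} + H^TH\right)^{-1}H^T \tag{24b} $$

where the last step in (24b) again applies the matrix inversion lemma.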
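The identities (23b), (24a), and (24b) are easy to check numerically. In the Python/NumPy sketch below, $\check\Psi$ is taken as the $N_T \times N_T$ diagonal matrix of all bit variances and $\bar\Psi$ as its $n_s \times n_s$ sub-block for the subspace, so that $\tilde H\Psi\tilde H^T = H\check\Psi H^T - \bar H\bar\Psi\bar H^T$; the dimensions, variances, and subspace indices are arbitrary illustrative choices. The left-hand sides use $Q$ formed directly from $\tilde H$, while the right-hand sides use only the partition-independent matrix $(\frac{N_0}{2}I + H\check\Psi H^T)^{-1}$ and the column-picking matrix $P$.

```python
import numpy as np

rng = np.random.default_rng(0)
N_R, N_T, n_s, N0 = 6, 4, 2, 0.8
H = rng.standard_normal((N_R, N_T))
psi = rng.uniform(0.1, 1.0, N_T)           # diagonal of Psi_check: variances of all bits
idx = [1, 3]                               # illustrative subspace (columns of H_bar)
rest = [j for j in range(N_T) if j not in idx]
P = np.eye(N_T)[:, idx]                    # column-picking matrix, H_bar = H P
H_bar, H_til = H @ P, H[:, rest]
Psi_bar = np.diag(psi[idx])
I_R, I_s = np.eye(N_R), np.eye(n_s)

# Direct route: Q built from H_tilde, one N_R x N_R inverse per partition.
Qi = np.linalg.inv(H_til @ np.diag(psi[rest]) @ H_til.T + N0 / 2 * I_R)
lhs23 = H_bar.T @ Qi @ H_bar
lhs24 = np.linalg.inv(lhs23) @ H_bar.T @ Qi

# Shared route: M^{-1} = (N0/2 I + H Psi_check H^T)^{-1} is the same for all partitions.
Mi = np.linalg.inv(N0 / 2 * I_R + H @ np.diag(psi) @ H.T)
G = H_bar.T @ Mi @ H_bar
rhs23 = np.linalg.inv(I_s - G @ Psi_bar) @ G                        # (23b)
rhs24 = np.linalg.inv(G) @ H_bar.T @ Mi                             # (24a)
rhs24b = (P.T @ np.diag(1 / psi)
          @ np.linalg.inv(N0 / 2 * np.diag(1 / psi) + H.T @ H) @ H.T)  # (24b)

print(np.allclose(lhs23, rhs23), np.allclose(lhs24, rhs24),
      np.allclose(H_bar.T @ Mi, rhs24b))                           # True True True
```

The point of the shared route is that only $n_s \times n_s$ inverses depend on the partition; the single larger inverse is computed once for all $N_T$ partitions.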

Next, simple matrix manipulations let us pre-process and simplify the computation
of (10) in the y-dependent part of the algorithm, which consists of a sum of terms of the form
$p(y|\bar{s})$ over all $\bar{s}$. Consider again

$$p(y|\bar{s}) = \frac{1}{\sqrt{(2\pi)^{N_R}|Q|}}\exp\left(-\frac{1}{2}\|y-\bar{H}\bar{s}\|^2_{Q^{-1}}\right),$$

which includes matrix-vector multiplications of dimension $N_R$. We can rewrite the exponent
as in (22), where the terms on the last line of (22) do not depend on $\bar{s}$ and will not affect
the final result in (10). So, from (22) we see that if $\bar{H}^TQ^{-1}\bar{H}$ and $(\bar{H}^TQ^{-1}\bar{H})^{-1}\bar{H}^TQ^{-1}$
are precomputed, the matrix-vector multiplications in $p(y|\bar{s})$ in the y-dependent part will be of
dimension $n_s \ll N_R$, which is evidently desirable.
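A small numerical check of (22) may help here. The sketch below is plain Python and uses toy dimensions, random data, and unit-variance interfering symbols, all of which are assumptions chosen for illustration. For every BPSK candidate $\bar{s}$, it evaluates the full $N_R$-dimensional metric and the reduced $n_s$-dimensional metric. It then confirms that the two differ only by the $\bar{s}$-independent constant on the last line of (22).

```python
import itertools
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def inverse(A):
    """Gauss-Jordan elimination with partial pivoting (adequate for toy sizes)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

rng = random.Random(0)
N_R, n_s, n_i, N0 = 4, 2, 2, 0.5   # toy sizes: receive dim, detected dim, interferers
H_bar = [[rng.gauss(0, 1) for _ in range(n_s)] for _ in range(N_R)]
H_til = [[rng.gauss(0, 1) for _ in range(n_i)] for _ in range(N_R)]
y = [rng.gauss(0, 1) for _ in range(N_R)]

# Q = (N0/2) I + H_til H_til^T, i.e. unit-variance interfering symbols (assumed)
HHt = matmul(H_til, transpose(H_til))
Q = [[HHt[i][j] + (N0 / 2 if i == j else 0.0) for j in range(N_R)] for i in range(N_R)]
Qi = inverse(Q)

# y-independent: A = H_bar^T Q^-1 H_bar (n_s x n_s) and G = A^-1 H_bar^T Q^-1 (n_s x N_R)
HtQi = matmul(transpose(H_bar), Qi)
A = matmul(HtQi, H_bar)
G = matmul(inverse(A), HtQi)

# y-dependent: one N_R-dimensional product, after which everything is n_s-dimensional
s_hat = matvec(G, y)
const = dot(y, matvec(Qi, y)) - dot(matvec(HtQi, y), s_hat)   # last line of (22)

def metric_full(s):      # ||y - H_bar s||^2_{Q^-1}, dimension N_R
    r = [a - b for a, b in zip(y, matvec(H_bar, s))]
    return dot(r, matvec(Qi, r))

def metric_reduced(s):   # (s_hat - s)^T A (s_hat - s), dimension n_s
    d = [a - b for a, b in zip(s_hat, s)]
    return dot(d, matvec(A, d))

# For every BPSK candidate, the two metrics differ by the same s-independent constant
gaps = [metric_full(s) - metric_reduced(s) for s in itertools.product((-1.0, 1.0), repeat=n_s)]
print(max(abs(g - const) for g in gaps))   # rounding-error level
```

Because the constant is common to all candidates, it cancels in the LLR ratio. This is why only the reduced metric has to be evaluated per candidate.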

We need to evaluate these matrices once for each partitioning (all $N_T$ of them), and we
can do this in one shot. For this purpose, we derive the identities in (23) and (24), where we
have defined $\bar{\Phi}$ and $\tilde{\Phi}$ to be diagonal matrices such that
$\tilde{H}\tilde{\Phi}\tilde{H}^T = H\Phi H^T - \bar{H}\bar{\Phi}\bar{H}^T$, and $P \in \{0,1\}^{N_T \times n_s}$ to be a matrix that has precisely
$n_s$ ones such that $\bar{H} = HP$ (a column-picking matrix). Recall that $\Phi$ is the covariance matrix of $s$.
For $n_s = 1$, the identity (23a) (also mentioned in [20], [21]) is well known from the equivalence
shown in [22, exerc. 8.18] of the MMSE filter [?, sec. 3.2.1]. Equation (24b) was also derived in
[20] and [21], although the derivations there contain a minor error. Specifically, the assumption
$UU^T = I$ in the singular value decomposition $H = U\Sigma V^T$ in [21, app. A.2.2] is not valid for $N_R > N_T$.
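The identities (23a), (24a), and (24b) can be checked numerically on a toy example. The following plain-Python sketch uses hypothetical sizes, random data, and a diagonal $\Phi$ with positive entries. It builds $Q$ by removing the selected columns' contribution from $\frac{N_0}{2}I + H\Phi H^T$ and then compares both sides of each identity. (23b) follows from (23a) by right-multiplying with $\bar{H}$, so it is not checked separately.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def add(A, B, scale=1.0):            # A + scale * B
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def diag(d):
    return [[d[i] if i == j else 0.0 for j in range(len(d))] for i in range(len(d))]

def inverse(A):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

rng = random.Random(1)
N_R, N_T, N0 = 5, 4, 0.8
idx = [0, 2]                                   # columns of H forming H_bar (one partitioning)
n_s = len(idx)
H = [[rng.gauss(0, 1) for _ in range(N_T)] for _ in range(N_R)]
phi = [rng.uniform(0.2, 1.0) for _ in range(N_T)]                          # diagonal of Phi
P = [[1.0 if i == idx[j] else 0.0 for j in range(n_s)] for i in range(N_T)]  # column picker
H_bar = matmul(H, P)
HbT = transpose(H_bar)
Phi_bar = diag([phi[i] for i in idx])

R = add(diag([N0 / 2] * N_R), matmul(matmul(H, diag(phi)), transpose(H)))  # (N0/2)I + H Phi H^T
Q = add(R, matmul(matmul(H_bar, Phi_bar), HbT), -1.0)                      # (N0/2)I + H~ Phi~ H~^T
Ri, Qi = inverse(R), inverse(Q)
G = matmul(matmul(HbT, Ri), H_bar)                                          # H_bar^T R^-1 H_bar

# (23a): H_bar^T Q^-1 = (I - H_bar^T R^-1 H_bar Phi_bar)^-1 H_bar^T R^-1
lhs_23a = matmul(HbT, Qi)
rhs_23a = matmul(inverse(add(diag([1.0] * n_s), matmul(G, Phi_bar), -1.0)), matmul(HbT, Ri))

# (24a): (H_bar^T Q^-1 H_bar)^-1 H_bar^T Q^-1 = (H_bar^T R^-1 H_bar)^-1 H_bar^T R^-1
lhs_24a = matmul(inverse(matmul(lhs_23a, H_bar)), lhs_23a)
rhs_24a = matmul(inverse(G), matmul(HbT, Ri))

# (24b): H_bar^T R^-1 = P^T Phi^-1 ((N0/2) Phi^-1 + H^T H)^-1 H^T
core = add(diag([N0 / (2 * p) for p in phi]), matmul(transpose(H), H))
rhs_24b = matmul(matmul(matmul(transpose(P), diag([1 / p for p in phi])), inverse(core)),
                 transpose(H))

errors = [max_abs_diff(lhs_23a, rhs_23a),
          max_abs_diff(lhs_24a, rhs_24a),
          max_abs_diff(matmul(HbT, Ri), rhs_24b)]
print(errors)   # all at rounding-error level
```

A side effect worth noting is visible in (24a): the left-hand side does not depend on $\bar{\Phi}$ at all.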
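Now, since we have established (23b) and (24b), we can immediately write

$$
\begin{aligned}
\bar{H}^TQ^{-1}\bar{H} &= \left(I - P^T\Phi^{-1}\left(\tfrac{N_0}{2}\Phi^{-1} + H^TH\right)^{-1}H^THP\bar{\Phi}\right)^{-1}P^T\Phi^{-1}\left(\tfrac{N_0}{2}\Phi^{-1} + H^TH\right)^{-1}H^THP, \\
(\bar{H}^TQ^{-1}\bar{H})^{-1}\bar{H}^TQ^{-1} &= \left(P^T\Phi^{-1}\left(\tfrac{N_0}{2}\Phi^{-1} + H^TH\right)^{-1}H^THP\right)^{-1}P^T\Phi^{-1}\left(\tfrac{N_0}{2}\Phi^{-1} + H^TH\right)^{-1}H^T,
\end{aligned} \tag{25}
$$

where the innermost inverse is of dimension $N_T$ and the two outermost inversions are of
dimension $n_s$. Focusing on the innermost inverse, it has been observed in [20] that the matrix
$(\frac{N_0}{2}\Phi^{-1} + H^TH)$ can be numerically unstable to invert, because some (diagonal)
values in $\Phi$ can be very small. This was addressed in [20] by writing
$\Phi^{-1}(\frac{N_0}{2}\Phi^{-1} + H^TH)^{-1} = (\frac{N_0}{2}I + H^TH\Phi)^{-1}$. This inverse is more stable, but
because the symmetry is lost it requires many more operations [23]. We want to use the efficient
algorithms available for inverting symmetric matrices [23] while still avoiding unstable
inversions. Therefore, we instead write

$$\Phi^{-1}\left(\tfrac{N_0}{2}\Phi^{-1} + H^TH\right)^{-1} = \Phi^{-\frac{1}{2}}\left(\tfrac{N_0}{2}I + \Phi^{\frac{1}{2}}H^TH\Phi^{\frac{1}{2}}\right)^{-1}\Phi^{\frac{1}{2}},$$

where $(\frac{N_0}{2}I + \Phi^{\frac{1}{2}}H^TH\Phi^{\frac{1}{2}})$ is symmetric and stable to invert. Computing $\Phi^{\frac{1}{2}}$
and $\Phi^{-\frac{1}{2}}$ is simple even though it requires square-root evaluations, since $\Phi$ is diagonal
with positive entries.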

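There are several different approaches to inverting a positive definite matrix. Some are
more stable than others, and some require fewer operations than others. One very fast and stable
approach is through the LDL-decomposition [23, p. 139], i.e., $(\frac{N_0}{2}I + \Phi^{\frac{1}{2}}H^TH\Phi^{\frac{1}{2}}) = LDL^T$,
where $L$ is a lower-triangular matrix with ones on its diagonal and $D$ is a diagonal matrix
with positive diagonal elements. The LDL-decomposition itself requires $N_T^3/3$ operations
[23, p. 139], and so does the inversion of $L$ and $D$ together. Hence, (25) becomes

$$
\begin{aligned}
\bar{H}^TQ^{-1}\bar{H} &= \left(I - P^T\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}H^THP\bar{\Phi}\right)^{-1}P^T\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}H^THP, \\
(\bar{H}^TQ^{-1}\bar{H})^{-1}\bar{H}^TQ^{-1} &= \left(P^T\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}H^THP\right)^{-1}P^T\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}H^T,
\end{aligned}
$$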
for which the number of operations for all partitionings can be summarized as follows:
• LDL-decomposition ($N_T^3/3$),
• $\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}$ ($N_T^3/3$),
• $P^T\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}(H^TH)P$ ($2n_s^2N_T^2$).
The remaining evaluations are inverses of matrices of very small dimension $n_s$. Closed-form
formulas exist for these, and they require a negligible number of operations. Thus,
the total number of operations required to compute $\bar{H}^TQ^{-1}\bar{H}$ explicitly for all partitionings
is $N_T^3$.
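The LDL route can be illustrated with a short sketch. It factors the symmetric matrix $\frac{N_0}{2}I + \Phi^{1/2}H^TH\Phi^{1/2}$ with a textbook unpivoted $LDL^T$ recursion and inverts the unit lower-triangular factor by forward substitution. It then checks that $\Phi^{-1/2}L^{-T}D^{-1}L^{-1}\Phi^{1/2}$ matches the non-symmetric form $(\frac{N_0}{2}I + H^TH\Phi)^{-1}$. This is plain Python with toy sizes and random data. It is not an optimized implementation, nor one that matches the operation counts above.

```python
import math
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Generic Gauss-Jordan inverse, used here only as a reference."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def ldl(A):
    """A = L D L^T for symmetric positive-definite A; L unit lower-triangular, d = diag(D)."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d

def unit_lower_inverse(L):
    """Inverse of a unit lower-triangular matrix by forward substitution."""
    n = len(L)
    X = [[float(i == j) for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            X[i][j] = -sum(L[i][k] * X[k][j] for k in range(j, i))
    return X

rng = random.Random(2)
N_R, N_T, N0 = 6, 4, 0.5
H = [[rng.gauss(0, 1) for _ in range(N_T)] for _ in range(N_R)]
phi = [rng.uniform(1e-3, 1.0) for _ in range(N_T)]   # some entries may be small
sq = [math.sqrt(p) for p in phi]                      # diagonal of Phi^{1/2}
HtH = matmul(transpose(H), H)

# Symmetric matrix (N0/2) I + Phi^{1/2} H^T H Phi^{1/2} = L D L^T
A = [[sq[i] * HtH[i][j] * sq[j] + (N0 / 2 if i == j else 0.0) for j in range(N_T)]
     for i in range(N_T)]
L, d = ldl(A)
Linv = unit_lower_inverse(L)
A_inv = matmul(transpose(Linv), [[v / d[i] for v in row] for i, row in enumerate(Linv)])

# Phi^{-1/2} L^-T D^-1 L^-1 Phi^{1/2}, which equals Phi^-1 ((N0/2) Phi^-1 + H^T H)^-1
W = [[A_inv[i][j] * sq[j] / sq[i] for j in range(N_T)] for i in range(N_T)]

# Reference: the non-symmetric form ((N0/2) I + H^T H Phi)^-1
ref = inverse([[HtH[i][j] * phi[j] + (N0 / 2 if i == j else 0.0) for j in range(N_T)]
               for i in range(N_T)])
recon = matmul(matmul(L, [[d[i] if i == j else 0.0 for j in range(N_T)] for i in range(N_T)]),
               transpose(L))
print(max_abs_diff(recon, A), max_abs_diff(W, ref))
```

The diagonal scalings by $\Phi^{\pm 1/2}$ cost only $O(N_T^2)$ operations, so the symmetric factorization keeps the cubic work on a matrix whose eigenvalues are bounded below by $N_0/2$.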

B. y-dependent processing 

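We need to compute, for all partitionings,

$$(\bar{H}^TQ^{-1}\bar{H})^{-1}\bar{H}^TQ^{-1}y = \left(P^T\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}H^THP\right)^{-1} \times P^T\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}H^Ty, \tag{26}$$

where only $P^T\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}H^Ty$ needs to be evaluated. The leftmost (inverse) matrix
in (26) is of dimension $n_s$ and has already been computed in the y-independent part. Note
that computing $H^Ty$ and subsequently $\Phi^{-\frac{1}{2}}L^{-T}D^{-1}L^{-1}\Phi^{\frac{1}{2}}H^Ty$ requires $2N_RN_T$ and
$2N_T^2$ operations, respectively. This $N_T$-vector does not depend on the partitioning, and $P^T$
merely selects $n_s$ of its entries. Hence, computing $(\bar{H}^TQ^{-1}\bar{H})^{-1}\bar{H}^TQ^{-1}y$ for all partitionings
requires $2N_RN_T + 2N_T^2$ operations.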
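The sharing of work across partitionings in (26) can be sketched as follows. The $N_T$-vector $\Phi^{-1}(\frac{N_0}{2}\Phi^{-1}+H^TH)^{-1}H^Ty$ is formed once. Each partitioning then selects $n_s$ of its entries and applies its own precomputed $n_s \times n_s$ inverse. For brevity, this plain-Python sketch uses a generic Gauss-Jordan inverse instead of the LDL factors. The list of partitionings is hypothetical, since the actual selection follows Sec. III-C. The result is checked against the direct $N_R$-dimensional computation.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def inverse(A):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

rng = random.Random(3)
N_R, N_T, N0 = 6, 4, 0.6
H = [[rng.gauss(0, 1) for _ in range(N_T)] for _ in range(N_R)]
y = [rng.gauss(0, 1) for _ in range(N_R)]
phi = [rng.uniform(0.3, 1.0) for _ in range(N_T)]   # diagonal of Phi (assumed values)
partitions = [[0, 1], [1, 2], [2, 3], [3, 0]]       # hypothetical partitionings, n_s = 2

# y-independent: W = Phi^-1 ((N0/2) Phi^-1 + H^T H)^-1 and one n_s x n_s inverse per partitioning
HtH = matmul(transpose(H), H)
core_inv = inverse([[HtH[i][j] + (N0 / (2 * phi[i]) if i == j else 0.0) for j in range(N_T)]
                    for i in range(N_T)])
W = [[core_inv[i][j] / phi[i] for j in range(N_T)] for i in range(N_T)]
WHtH = matmul(W, HtH)
K_inv = {tuple(p): inverse([[WHtH[i][j] for j in p] for i in p]) for p in partitions}

# y-dependent: z = W H^T y is shared by all partitionings; P^T z just picks entries
z = matvec(W, matvec(transpose(H), y))
s_hat = {tuple(p): matvec(K_inv[tuple(p)], [z[i] for i in p]) for p in partitions}

def direct(p):
    """(H_bar^T Q^-1 H_bar)^-1 H_bar^T Q^-1 y with Q = (N0/2) I + H~ Phi~ H~^T (for checking)."""
    rest = [j for j in range(N_T) if j not in p]
    Hb = [[row[j] for j in p] for row in H]
    Q = [[sum(H[i][j] * phi[j] * H[k][j] for j in rest) + (N0 / 2 if i == k else 0.0)
          for k in range(N_R)] for i in range(N_R)]
    HbtQi = matmul(transpose(Hb), inverse(Q))
    return matvec(inverse(matmul(HbtQi, Hb)), matvec(HbtQi, y))

err = max(abs(a - b) for p in partitions for a, b in zip(s_hat[tuple(p)], direct(p)))
print(err)   # rounding-error level
```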

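Computing (10) for each $s_k$ requires $n_s^2 2^{n_s}$ operations, since the exponents of $p(y|\bar{s})$ consist
of matrix-vector multiplications of dimension $n_s$. The remaining bottleneck is the computation
of the updated covariance matrix $\bar{H}^TQ'^{-1}\bar{H}$ and the update of $y$ to $y'$ in (13). Computing
$\bar{H}^TQ'^{-1}\bar{H}$ requires $N_T^3$ operations, analogously to the computation of $\bar{H}^TQ^{-1}\bar{H}$. For
the update in (13), we have $y' = y - \tilde{H}E\{\tilde{s}|y\} = y - HE\{s|y\} + \bar{H}E\{\bar{s}|y\}$. After
the transformation in (26), with the updated matrix $Q'$ in place of $Q$, this becomes

$$
\begin{aligned}
(\bar{H}^TQ'^{-1}\bar{H})^{-1}\bar{H}^TQ'^{-1}y' &= (\bar{H}^TQ'^{-1}\bar{H})^{-1}\bar{H}^TQ'^{-1}(y - HE\{s|y\}) + E\{\bar{s}|y\} \\
&= E\{\bar{s}|y\} + \left(P^T\Phi'^{-1}\left(\tfrac{N_0}{2}\Phi'^{-1} + H^TH\right)^{-1}H^THP\right)^{-1} \\
&\quad \times P^T\Phi'^{-1}\left(\tfrac{N_0}{2}\Phi'^{-1} + H^TH\right)^{-1}\left(H^Ty - H^THE\{s|y\}\right),
\end{aligned} \tag{27}
$$

where $\Phi'$ denotes the updated diagonal covariance matrix underlying $Q'$.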
Det. method          y-independent            y-dependent
SUMIS (proposed)     $N_RN_T^2 + N_T^3$       $N_T^3$
PM                   $N_RN_T^2 + N_T^3$       $N_T^3M^{n_s}$
soft MMSE            $N_RN_T^2 + N_T^3$       $< N_T^3$
SD aided max-log     $3N_RN_T^2$              variable
max-log              $N_TM^{N_T}$             $N_RM^{N_T}$
exact LLR            $N_TM^{N_T}$             $N_TM^{N_T}$

Tab. 1: Complexity summary. We introduce $M = |\mathcal{S}|$ here as the constellation cardinality.
Note that the presented complexity counts are valid for sufficiently large $N_T$. More specifically,
for SUMIS they are valid for $M^{n_s} \ll N_T^2$.

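From the discussion after (26), we can conclude that (27) requires $4N_T^2$ operations for all
partitionings: forming $H^THE\{s|y\}$ and the subsequent multiplication by the updated $N_T \times N_T$
matrix in (27) each require $2N_T^2$ operations, while $H^Ty$ is already available. Lastly, the LLR
computation of each bit requires $n_s^2 2^{n_s}$ operations.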
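The $n_s^2 2^{n_s}$ count can be made concrete with a minimal sketch of the per-partitioning LLR evaluation. It assumes BPSK per real dimension, uniform priors, and an exact log-sum-exp ratio of the form $\log\sum_{\bar{s}: s_k=+1} p(y|\bar{s}) - \log\sum_{\bar{s}: s_k=-1} p(y|\bar{s})$, with no max-log shortcut. The inputs are the precomputed $\bar{H}^TQ^{-1}\bar{H}$ and $\hat{s} = (\bar{H}^TQ^{-1}\bar{H})^{-1}\bar{H}^TQ^{-1}y$, and only the $\bar{s}$-dependent part of (22) is used. The function name `approx_llr` is hypothetical.

```python
import itertools
import math

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def approx_llr(A, s_hat, k=0):
    """LLR of entry k of s_bar; BPSK per real dimension and uniform prior are assumed.

    A = H_bar^T Q^-1 H_bar (n_s x n_s) and s_hat = A^-1 H_bar^T Q^-1 y are precomputed.
    The s-independent terms of (22) cancel in the ratio and are therefore dropped.
    """
    n = len(s_hat)
    num, den = [], []
    for s in itertools.product((-1.0, 1.0), repeat=n):                     # 2^{n_s} candidates
        d = [a - b for a, b in zip(s_hat, s)]
        metric = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))  # ~n_s^2 ops
        (num if s[k] > 0 else den).append(-0.5 * metric)
    return logsumexp(num) - logsumexp(den)

A = [[1.5, 0.3], [0.3, 1.0]]
print(approx_llr(A, [0.4, -0.2]))
```

For $n_s = 1$, the expression reduces to the familiar linear form $2a\hat{s}$ with $A = [a]$. For example, `approx_llr([[2.0]], [0.3])` gives 1.2 up to rounding.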

C. Summary 

Under the assumption that $N_R \ge N_T$ and that no a priori knowledge of $s$ is available, the
y-independent part of the algorithm requires roughly $N_RN_T^2 + N_T^3$ operations, which is the
same number of operations as required by the soft MMSE algorithm. The y-dependent
part requires roughly $N_T^3$ operations. Thus, the total number of operations
required by the SUMIS detector to evaluate all LLRs associated with one received vector $y$
is

$$N_RN_T^2 + 2N_T^3.$$
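The leading-order total is easy to tabulate. The helper below evaluates $N_RN_T^2 + 2N_T^3$ for the real-valued sizes used in the numerical results. For comparison, it also computes the sum of the two exact-LLR entries of Tab. 1 with $M = 2$. These are order-of-magnitude counts under the assumptions stated above, not measured run-times, and the function names are hypothetical.

```python
def sumis_ops(N_R, N_T):
    """Leading-order SUMIS count from the summary: (N_R N_T^2 + N_T^3) + N_T^3."""
    y_independent = N_R * N_T**2 + N_T**3
    y_dependent = N_T**3
    return y_independent + y_dependent

def exact_llr_ops(N_T, M):
    """Sum of the two exact-LLR entries of Tab. 1 (N_T M^N_T each)."""
    return 2 * N_T * M**N_T

for n in (12, 26):                  # real-valued sizes of the 6x6 and 13x13 complex systems
    print(n, sumis_ops(n, n), exact_llr_ops(n, 2))
```

With $N_R = N_T$, the SUMIS total reduces to $3N_T^3$, whereas the exact-LLR count grows exponentially in $N_T$.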

Compare the SUMIS complexity with that of other detectors (see Tab. 1). PM requires
approximately $N_T^3 2^{n_s}$ operations in the y-dependent part alone, and max-log (via SD)
requires $3N_RN_T^2$ operations in the initialization stage alone. SUMIS therefore provides clear
complexity savings. As we will soon see, these savings come with significant performance
gains over these competing methods.

VIII. Numerical Results 

A. Simulation Setup 

Using Monte Carlo simulations, we evaluate the performance of our proposed method
and the competing ones in terms of frame-error rate (FER) as a function of the normalized
signal-to-noise ratio $1/N_0$. To make the results statistically reliable, we count 300 frame
errors for each simulated point of $1/N_0$. We simulate $6\times 6$ and $13\times 13$ complex MIMO
systems with $M^2$-ary quadrature amplitude modulation ($M^2$-QAM) with unit
average energy per real-valued symbol, where $M \in \{2,4\}$. The detection is therefore
performed on real-valued $12\times 12$ and $26\times 26$ MIMO systems with $M$-ary pulse amplitude
modulation ($M$-PAM). The channel is Rayleigh fading, where each complex-valued
matrix element is independently sampled from $\mathcal{CN}(0, \log_2(M^2)\times\text{rate})$. We use three
different irregular low-density parity-check (LDPC) codes with rates $\{1/4, 1/2, 3/4\}$, each
with a codeword length of 10000 bits. Two different coherence times are used: slow fading
(each codeword sees one channel realization) and fast fading (each codeword spans 40 channel
matrices). We plot the FER curves of the exact LLR (as defined by (?)), the max-log
approximation [?], SUMIS for $n_s = 1, 3$, SUMIS stage I only (without the purification
procedure) for $n_s = 1, 3$, and PM for $n_s = r + 1 = 3$ [?]. We include the approximate LLRs
computed in SUMIS stage I only to show the performance gain of the channel model
purification, which is stage II of SUMIS. In the figures that follow, dashed lines represent
the proposed methods and solid lines represent the competing
ones. Recall that SUMIS stage I only, for $n_s = 1$, is equivalent to the soft MMSE method.
Since the FCSD method is an approximation of the PM method, we refrain from plotting
its performance curves. We also omit performance plots of RD-MLS and other SD-based
methods. These methods approximate the max-log method, so they cannot perform
better, and their varying complexity is lower-bounded by the complexity of the
sorted QR-decomposition [24], [25]; see Tab. 1.
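The channel and constellation generation described above can be sketched minimally as follows. The sketch assumes that $\log_2(M^2)\times\text{rate}$ is the total variance of each complex entry, split equally between the real and imaginary parts. It also assumes that the real-valued model uses the standard stacking $[\Re H, -\Im H; \Im H, \Re H]$. Bit labeling, coding, and noise generation are omitted, and all function names are hypothetical.

```python
import math
import random

def pam_alphabet(M):
    """M-PAM points scaled to unit average energy per real-valued symbol."""
    points = [2 * m - (M - 1) for m in range(M)]          # M=4 -> [-3, -1, 1, 3]
    scale = math.sqrt(sum(p * p for p in points) / M)
    return [p / scale for p in points]

def rayleigh_channel(n, M, rate, rng):
    """n x n complex channel with i.i.d. CN(0, log2(M^2) * rate) entries (assumed split)."""
    var = math.log2(M * M) * rate
    sd = math.sqrt(var / 2)                                 # per real / imaginary part
    return [[complex(rng.gauss(0, sd), rng.gauss(0, sd)) for _ in range(n)]
            for _ in range(n)]

def to_real(Hc):
    """Real-valued equivalent [[Re H, -Im H], [Im H, Re H]] of a complex matrix."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in Hc]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in Hc]
    return top + bottom

rng = random.Random(4)
H = to_real(rayleigh_channel(6, 2, 0.5, rng))   # 6x6 4-QAM with a rate-1/2 code -> 12x12 real
print(len(H), len(H[0]), pam_alphabet(2), pam_alphabet(4))
```

Scaling the channel variance by $\log_2(M^2)\times\text{rate}$ (information bits per complex symbol) is what makes $1/N_0$ a normalized SNR, if the variance is read as stated above.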

B. Results 

In Fig. 1, we simulate the slow-fading $6\times 6$ MIMO system with 4-QAM and the
LDPC code of rate 1/2. There is no iteration between the detector and the decoder, and
the transmitted symbols are assumed to be uniformly distributed. This plot illustrates our
principal comparison. The remaining figures (Figs. 2-7) illustrate extended comparisons that deal with
different scenarios of interest: slow/fast fading, moderate/large MIMO size, low/high rate
codes, full/partial CSI, higher-order constellations, and iterative/non-iterative receivers. The
setting in Fig. 1 includes all the above-mentioned detection methods, whereas the remaining
figures include only those methods that show noteworthy variations from what is already
seen in Fig. 1.

The results in Fig. 1 clearly indicate that the SUMIS detector performs close to the exact
LLR (optimal soft detector). It outperforms the PM and the max-log methods,



July 16, 2012 



DRAFT 






[Fig. 1 plot: FER (log scale) versus 1/N_0 (dB), 1/N_0 from -7 to -3 dB]



Fig. 1: FER as a function of $1/N_0$ for the slow-fading $6\times 6$ MIMO system ($N_R = N_T = 12$
in the real-valued model) with 4-QAM (2-PAM per real dimension) and the LDPC code of rate 1/2. The shown
performance curves are: (i) dashed curves for SUMIS stage I only and the complete
SUMIS procedure with $n_s = 1$ and $n_s = 3$, spanning from right to left, and (ii) solid curves
for the exact LLR method, the max-log method, and the PM method with $n_s = r + 1 = 3$.

and it does so at a much lower complexity; see Tab. 1. Note that the complexity of SUMIS
with $n_s = 3$ is much lower than that of PM with $n_s = r + 1 = 3$, even though the partitioned
problems have the same size. The reason is that the sums in the PM method consist
of terms whose exponents require matrix-vector multiplications of much
larger dimension than in SUMIS. The complexity versus performance tradeoff of SUMIS,
for both the complete and the stage I only variant, is clear: the larger the $n_s$, the better the
performance of both variants.

Fig. 2 uses the same setting as Fig. 1, except that the code of rate 1/2 is replaced by codes
of rates 1/4 and 3/4. This plot suggests that the performance gap between SUMIS and exact
LLR is larger for higher coding rates, although it remains very small. In particular,
the max-log curve is much closer to the exact LLR curve for higher rates than for lower rates.
The high-rate scenario clearly shows the importance of the model "purification" procedure
(SUMIS stage II), as the performance gap between SUMIS and SUMIS stage I only is
significant. For the low coding rate scenario, the performance gap between SUMIS and exact
Fig. 2: Same as in Fig. 1 but with LDPC codes of rate 1/4 (black curves) and rate 3/4
(gray curves). Note that the left-most blended curves are five different curves: exact LLR,
SUMIS for $n_s = 1, 3$, and SUMIS stage I only for $n_s = 1, 3$.

LLR is negligible. Similar results were observed for short convolutional codes with a codeword
length of 100 bits, but these plots are not included due to space limitations.

A large MIMO system is simulated in Fig. 3, which uses the same setting as Fig. 1
but with a $13\times 13$ complex-valued MIMO system instead of $6\times 6$. The purpose is to
show how close the soft MMSE (SUMIS stage I only for $n_s = 1$) and, in particular, SUMIS
are to the exact LLR performance curve. To our knowledge, plots that include the exact LLR
curve for large MIMO systems are not available in the literature, probably because of
the enormous complexity and simulation time required to evaluate the optimal curve. We
used a highly optimized version of the exact LLR detector, and evaluating the curves in Fig. 3
still took approximately 50000 core-hours. A very interesting observation is that the max-log
curve has a bigger gap to the exact LLR curve than in Fig. 1. This is the curve that various
SD algorithms, and in particular the TABU search algorithm for large MIMO systems [?], aim
to achieve. As predicted in Sec. VI, Fig. 3 shows that the soft MMSE is much closer to the
exact LLR curve than in Fig. 1. Also, the channel model purification in SUMIS compensates
for the performance loss of soft MMSE. The performance of SUMIS is impressive,
Fig. 3: Same as in Fig. 1 but for a $13\times 13$ MIMO system instead of $6\times 6$. Compared with
Fig. 1, note the increasing gap between max-log and exact LLR, and the decreasing
gap between soft MMSE (SUMIS stage I only for $n_s = 1$) and exact LLR. The SUMIS curve
is very close to the exact LLR curve.

both with stage I only and with stage II included, which suggests that approximating the 
exact LLR expression directly (which is the philosophy of SUMIS) is a better approach than 
max-log. 

The fast-fading scenario is simulated in Fig. 4, i.e., the same setting as in Fig. 1 except that
each codeword spans 40 channel realizations instead of only one. Apart from the PM method,
which performs worse than in Fig. 1, the curves are similar. As expected, simple soft
MMSE (SUMIS stage I only for $n_s = 1$) performs much better when the channel varies often
than when it stays constant over a whole codeword. Ill-conditioned channel matrices are the
main cause of poor MMSE performance. Under fast fading they have less impact, because they
are less likely to occur and, when they do, they affect only small portions of a transmitted codeword.

Fig. 5 uses the same setting as Fig. 1 but with 16-QAM instead of 4-QAM. Its purpose is to
quantify how SUMIS performs for higher-order constellations. As the results in Fig. 5 show,
SUMIS performs very well, in particular in relation to its complexity. For instance, for the
y-dependent part, SUMIS, PM,
Fig. 4: Same as in Fig. 1 but with fast fading, i.e., a codeword spans 40 independent channel
realizations.

and exact LLR require $12^3$, $12^3\cdot 4^3$, and $12\cdot 4^{12}$ operations, respectively. This suggests a
speedup of SUMIS by factors of 64 and roughly $10^5$ relative to PM and exact LLR, respectively.
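The quoted speedups follow directly from the y-dependent entries of Tab. 1 with $N_T = 12$ real dimensions, $M = 4$ (4-PAM per real dimension), and $n_s = 3$. The short calculation below only reproduces that arithmetic. It does not measure run-time.

```python
N_T, M, n_s = 12, 4, 3                  # 6x6 16-QAM -> 12 real dims, 4-PAM, n_s = 3
ops = {
    "SUMIS": N_T**3,                    # y-dependent entries of Tab. 1
    "PM": N_T**3 * M**n_s,
    "exact LLR": N_T * M**N_T,
}
speedup_pm = ops["PM"] / ops["SUMIS"]
speedup_exact = ops["exact LLR"] / ops["SUMIS"]
print(speedup_pm, round(speedup_exact))   # 64.0 116508
```

The exact-LLR ratio is about $1.17\times 10^5$, which is the "$10^5$ times" quoted in the text.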

Yet another important scenario that often occurs in practice is detection under imperfect
CSI (ICSI). This is shown in Figs. 6a and 6b, which use the same settings as Fig. 1 and
Fig. 5, respectively, but with knowledge of only $\hat{H}$ instead of $H$. The error-matrix-element
variance $\delta^2$ is directly proportional to the noise variance $N_0$, i.e., $\delta^2 = \alpha N_0$,
where $\alpha$ is a constant. We use both matched and mismatched ICSI detection. Mismatched
detection uses $P(s|y,H)|_{H=\hat{H}}$, i.e., it plugs in the estimate $\hat{H}$ in place of $H$
without taking into account the implications of having only $\hat{H}$ instead of $H$. By contrast,
matched ICSI detection uses $P(s|y,\hat{H})$, as discussed
in Sec. V. Note in Fig. 6b that only the SUMIS curve uses the matched version, whereas
the remaining curves use the mismatched versions. The reason is that few algorithms
support matched ICSI detection when the symbol constellation, such as
16-QAM, does not have constant modulus. For instance, the PM algorithm cannot
provide this. It is evident from Fig. 6 that the matched versions outperform the mismatched versions.
The results in Fig. 6a resemble those in Fig. 1, with only a minor shift in
signal-to-noise ratio. This comes as no surprise, because the effective channel model in (20) for BPSK
per real dimension (as in Fig. 6a) is equivalent to the perfect-CSI model (as in Fig. 1) up to a scaling of
the noise variance. More interestingly, Fig. 6b further confirms the unprecedented performance
of SUMIS: for higher-order QAM constellations, the matched
SUMIS curve performs similarly to the mismatched max-log curve and outperforms
the remaining ones.

Fig. 5: Same as in Fig. 1 but for 16-QAM (4-PAM per real dimension) instead of 4-QAM. The exact
LLR curve has been excluded due to the massive complexity required to evaluate the FER.
Its complexity is of the same order of magnitude as that in Fig. 3.

The last example illustrates a setting with iterative decoding, where the
detector and the decoder exchange information, referred to as soft-input soft-output (SISO)
decoding. This is shown in Fig. 7, which uses the same iterative decoding setup as in [13,
Fig. 1] under the same simulation environment as Fig. 1. Again, SUMIS,
as presented in Sec. IV, shows striking performance at very low complexity.

IX. Conclusions 

We have proposed a novel soft MIMO detection method, SUMIS, that outperforms today's
state-of-the-art detectors, runs at fixed complexity, provides a clear and well-defined tradeoff
(a) Same as Fig. 1 but with imperfect CSI. The matched SUMIS version used here is presented in Sec. V-A.
(b) Same as Fig. 5 but with imperfect CSI. The matched SUMIS version used here is presented in Sec. V-B.

Fig. 6: An example with imperfect CSI at the receiver. The error-matrix-element variance $\delta^2$
is directly proportional to the noise variance $N_0$. The matched detectors use $P(s|y,\hat{H})$,
and the mismatched detectors use $P(s|y,H)|_{H=\hat{H}}$.
Fig. 7: Detection with iterative decoding using the same setting as in Fig. 1. The exact LLR
curve with no SISO iterations, marked in gray, serves as a reference curve. The remaining
curves, colored black, are the same as in Fig. 1 but after 2 SISO iterations. The
dashed lines represent the extended SUMIS algorithm as presented in Sec. IV.

between computational complexity and detection performance, and is highly parallelizable.
The ideas behind it are fundamentally simple and allow for very simple algorithmic
implementations. The proposed method has a complexity of the same order of magnitude
as the linear methods. We have conducted a thorough numerical evaluation of our proposed
method to quantify and validate its performance in comparison with today's state-of-the-art
methods. Our simulation results indicate that SUMIS (for low $n_s$) outperforms, in many cases,
the max-log method and therefore also all other methods that approximate max-log, such as SD.
The same conclusion holds for the comparison with the PM method. It is remarkable
that this performance is achieved with a complexity much smaller than that of the competing
methods; see Tab. 1.

More fundamentally, these results indicate that approximating the exact LLR expression
directly (which is the philosophy of SUMIS) is a better approach than max-log. This is
especially evident for larger MIMO systems, where the performance gap between the max-log
approximation and the exact LLR seems to increase. Yet another very important conclusion
is that SUMIS has remarkable hardware-implementation advantages [20] over
branching-type algorithms. SUMIS uses hardware-friendly algebraic operations and needs no
IF-statements, which are known not to be particularly hardware friendly. The SUMIS detector
opens the door for a whole new class of detectors that can be utilized in the future.